Export fp8 te nemo to trt-llm #10096

Merged: 27 commits, Aug 29, 2024

Conversation

Contributor

@Laplasjan107 Laplasjan107 commented Aug 9, 2024

What does this PR do?

Add support for exporting FP8 Transformer Engine (TE) NeMo checkpoints to TensorRT-LLM.

Collection: nlp

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
from nemo.export import TensorRTLLM

trt_llm_exporter = TensorRTLLM(model_dir="/opt/checkpoints/tmp_triton_model_repository/")
trt_llm_exporter.export(nemo_checkpoint_path="/opt/checkpoints/GPT-2B-001_bf16_tp1.nemo", model_type="gptnext", n_gpus=1)

# FP8 settings are autodetected when the quantisation flags (fp8_quantized, fp8_kvcache) are not set
output = trt_llm_exporter.forward(["What is the best city in the world?"], max_output_token=17, top_k=1, top_p=0.0, temperature=1.0, fp8_quantized=True, fp8_kvcache=True)
print("output: ", output)
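
The comment in the usage example says the quantisation flags are autodetected when they are not set. A minimal sketch of that tri-state pattern follows, assuming an unset flag is represented as `None` and that FP8 checkpoints can be recognised by a marker in their weight names. The helper `resolve_fp8_flag` and the marker string `fp8_scale` are hypothetical and are not taken from the NeMo code.

```python
from typing import Iterable, Optional

# Hypothetical marker: an assumed substring that identifies FP8 scaling
# entries among checkpoint keys. The real exporter may detect FP8 differently.
FP8_MARKER = "fp8_scale"


def resolve_fp8_flag(user_flag: Optional[bool], checkpoint_keys: Iterable[str]) -> bool:
    """Return the effective FP8 setting.

    An explicit True/False from the caller always wins. Only when the flag
    is None (not set) is the value inferred from the checkpoint keys.
    """
    if user_flag is not None:
        return user_flag
    return any(FP8_MARKER in key for key in checkpoint_keys)


if __name__ == "__main__":
    keys = ["layers.0.weight", "layers.0.fp8_scale"]
    print(resolve_fp8_flag(None, keys))   # flag not set: inferred from keys
    print(resolve_fp8_flag(False, keys))  # explicit flag overrides detection
```

Using `None` rather than `False` as the default keeps "not set" distinguishable from "explicitly disabled", which is what makes the override in the second call possible.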

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Piotr Kaminski added 3 commits August 14, 2024 14:34
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Piotr Kaminski added 2 commits August 14, 2024 14:46
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Piotr Kaminski and others added 4 commits August 16, 2024 09:14
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Laplasjan107 and others added 10 commits August 19, 2024 14:47
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Collaborator

@oyilmaz-nvidia oyilmaz-nvidia left a comment


Can we please remove the parts that are not related to the fp8 ckpt support? I saw that some of the values in the config were removed. We are in the process of moving the main parts of the export into mcore, and some of the code optimizations have already been done in mcore.

nemo/export/tensorrt_llm.py (outdated, resolved)
nemo/export/trt_llm/converter/model_converter.py (outdated, resolved)
nemo/export/trt_llm/converter/model_to_trt_llm_ckpt.py (outdated, resolved)
nemo/export/trt_llm/converter/model_to_trt_llm_ckpt.py (outdated, resolved)
JimmyZhang12 previously approved these changes Aug 22, 2024
Collaborator

@JimmyZhang12 JimmyZhang12 left a comment


In general, most of the changes are much-needed code cleanups that have been overdue for a while.
@shanmugamr1992 is simultaneously working to port this code to Megatron Core, and he also has many changes to clean up the code. As @oyilmaz-nvidia has mentioned, we should coordinate with him to see which of these need to land in this PR. For instance, he also refactors the giant messy weight name dict here.

Can we make sure these refactors do not break NeMo-Aligner's existing code path? I believe the CI isn't running for Aligner. cc @terrykong

nemo/export/tensorrt_llm.py (outdated, resolved)
nemo/export/trt_llm/converter/model_converter.py (outdated, resolved)
nemo/export/trt_llm/converter/model_to_trt_llm_ckpt.py (outdated, resolved)
nemo/export/trt_llm/converter/model_converter.py (outdated, resolved)
nemo/export/trt_llm/converter/model_converter.py (outdated, resolved)
@terrykong
Collaborator

@JimmyZhang12 I'm not sure I will be able to test this, but I'd like to try to manually test the NeMo-Aligner TRTLLM integration with this PR. It looks like this PR is rooted after @oyilmaz-nvidia upgraded to v11, so I can try a one-off build of (TRTLLM v10) + (this NeMo PR) + (mcore ToT), and hopefully with a few hacks I can validate it.

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Collaborator

@terrykong terrykong left a comment


Blocking until I can verify aligner doesn't obviously break (ETA this week)

Piotr Kaminski and others added 3 commits August 27, 2024 07:49
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Piotr Kaminski and others added 3 commits August 28, 2024 01:09
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Collaborator

@terrykong terrykong left a comment


From a smoke test, the Aligner code path looks okay

@janekl janekl merged commit 9796b69 into NVIDIA:main Aug 29, 2024
128 checks passed
Edresson pushed a commit that referenced this pull request Aug 29, 2024
* initial commit

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* PR draft

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* fixed scaling weights

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* fixed zarr loading, added flags, refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix expert key mapping

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix: failed test was finishing with exit code 0

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* test commit -- rerun github checks

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* bugfix: naming

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* bugfix v2: naming

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* apply code review changes

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* fix TensorRTLLM build (fp8 still not supported)

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

* undo refactor

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* bugfix: arguments to dist_convert

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>

---------

Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
adityavavre pushed a commit to adityavavre/NeMo that referenced this pull request Sep 15, 2024
Signed-off-by: adityavavre <aditya.vavre@gmail.com>
6 participants